1.  Introduction  and  Summary 

Research  into  improving  and  measuring  computer  software  reliability 
has  progressed  along  several  different  directions.  Typical  of  these  are 
structured  programming,  proofs  of  correctness  of  programs,  and  the  stochastic 
analysis  of  software  failure  data  [cf.  Amster  and  Shooman  (1975)]. 

In  this  paper  we  shall  focus  attention  on  certain  decision-theoretic 
aspects  of  the  software  reliability  problem.  These  aspects  arise  quite 
naturally  when  we  consider  a statistical  analysis  of  software  failure  data. 

In particular, we shall develop a procedure for testing the hypothesis that
a given software system contains no errors and, in the sequel, determine an
optimal interval of time for which the software has to be exercised in order
to test this hypothesis. The overall organization of our paper is as follows.

In Section 2, we shall briefly review a simple probabilistic model
for describing software failures. This model is due to Jelinski and Moranda
(1972) and has also been described by Lloyd and Lipow (1977), p. 516. In
Section 3 we shall present some results pertaining to the estimation of the
parameters of the model discussed in Section 2. We shall also present in
Section 3 an empirical stopping rule which signals the end of the debugging
phase for a given software system. This stopping rule has recently been
proposed by us in the open literature [see Forman and Singpurwalla (1977)].
In Section 4 we shall discuss a test of the hypothesis that the software
contains no more errors, and determine an optimal interval of time for which
the program has to be exercised in order to test the hypothesis.

The  material  in  Sections  2 and  3 can  be  regarded  as  expository, 
whereas  the  material  in  Section  4 is  new  and  represents  the  raison  d'etre 
of  this  paper. 

2.  The  Model  by  Jelinski  and  Moranda 

Jelinski  and  Moranda  (1972)  have  proposed  a model  for  describing 
failures  of  computer  software.  Variations  of  this  model  have  been  considered 
by  Shooman  and  others  [Shooman  et  al.  1972a;  Shooman  1972b,  1973]  in  several 
contexts. The applicability of this model for analyzing software failure
data from the Apollo program and from a certain system of the U.S. Navy
has been discussed by Jelinski and Moranda. Other applications of this
model have been described by Moranda (1975).

Let us denote the initial error content in a large software system,
such as an operating system, by $N$; $N$ is, of course, unknown. By
assumption, the failure rate at any point in time is proportional to the
residual number of errors in the software. Thus, if $T_1, T_2, \ldots$ denote
the time points at which software errors are detected and corrected, then the
failure rate at any time point between $T_{i-1}$ and $T_i$ is $(N-i+1)\phi$,
where $\phi$ is some unknown constant of proportionality.
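
As a minimal sketch of the model just described, the following Python
fragment evaluates the failure rate after a given number of corrections and
draws successive failure intervals. It assumes, as the Jelinski and Moranda
model does, that each interval is exponentially distributed with the rate in
force while it runs. The function names and the numerical values of N and
phi are illustrative only.

```python
import random


def jm_failure_rate(errors_corrected, N, phi):
    """Failure rate once `errors_corrected` errors have been removed.

    Between the (i-1)-th and i-th failure, i-1 errors have been corrected,
    so the rate is (N - (i - 1)) * phi = (N - i + 1) * phi.
    """
    if not 0 <= errors_corrected <= N:
        raise ValueError("errors_corrected must lie between 0 and N")
    return (N - errors_corrected) * phi


def simulate_failure_intervals(N, phi, n, seed=0):
    """Draw n successive failure intervals t_1, ..., t_n (requires n <= N)."""
    if n > N:
        raise ValueError("cannot observe more failures than there are errors")
    rng = random.Random(seed)
    return [rng.expovariate(jm_failure_rate(i - 1, N, phi))
            for i in range(1, n + 1)]


if __name__ == "__main__":
    N, phi = 10, 0.05  # illustrative values, not taken from any data set
    # The rate steps down by phi each time an error is corrected.
    print([jm_failure_rate(j, N, phi) for j in range(N + 1)])
    print(simulate_failure_intervals(N, phi, n=5))
```

The list of rates printed first is the downward staircase that a plot of the
failure rate against time would show.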

In Figure 2.1 we show the behavior of the failure rate for this
"de-eutrophication" process.

3.  Parameter  Estimation  and  an  Empirical  Stopping  Rule 

As is discussed in detail by Forman and Singpurwalla (1977), the
estimation of $N$ and $\phi$, and the development of a stopping rule, are
based upon an analysis of the behavior of the likelihood function
$L(\phi, N)$, where

$$L(\phi, N) = \prod_{i=1}^{n} (N-i+1)\,\phi\, \exp\bigl(-(N-i+1)\,\phi\, t_i\bigr) ,$$

and $t_i$ denotes the length of the $i$-th failure interval, that is, the
time between the $(i-1)$-th and the $i$-th detected failures, $i = 1, \ldots, n$.

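Let $\ell(\phi, N)$ denote the natural logarithm of $L(\phi, N)$,

$$T = \sum_{i=1}^{n} t_i , \qquad \text{and} \qquad k = \sum_{i=1}^{n} (i-1)\, t_i .$$

Since $N$ takes only integer values, the unique maximum likelihood
estimator of $N$, say $\hat{N}$, is that value of $N$ which simultaneously
satisfies

$$\log\frac{N}{N-n} + n \log\frac{(N-1)T-k}{NT-k} \ge 0$$

and

$$-\log\frac{N+1}{N+1-n} + n \log\frac{(N+1)T-k}{NT-k} \ge 0 ; \qquad (3.1)$$

that is, the likelihood maximized over $\phi$ is no smaller at $N$ than at
$N-1$ or at $N+1$.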

Given $\hat{N}$, the maximum likelihood estimator of $\phi$ is

$$\hat{\phi} = \frac{n}{T\hat{N} - k} . \qquad (3.2)$$
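
The estimation step can be sketched numerically as follows. Rather than
checking the two inequalities directly, this sketch searches the integers
from n upward for the value that maximizes the likelihood after phi has been
replaced by n / (N*T - k). The function names and the search cap `n_max` are
assumptions introduced for illustration; when the search stops at the cap,
the data are most likely not informative enough to bound N.

```python
import math


def profile_log_likelihood(N, intervals):
    """Log-likelihood at integer N >= n with phi set to n / (N*T - k).

    Returns the pair (log-likelihood, phi estimate for this N).
    """
    n = len(intervals)
    T = sum(intervals)
    # enumerate() starts at 0, which is the weight (i - 1) for 1-based i.
    k = sum(j * t for j, t in enumerate(intervals))
    phi_hat = n / (N * T - k)
    log_lik = (sum(math.log(N - j) for j in range(n))
               + n * math.log(phi_hat) - n)
    return log_lik, phi_hat


def jm_mle(intervals, n_max=10_000):
    """Integer N in [n, n_max] maximizing the likelihood, with its phi."""
    n = len(intervals)
    N_hat = max(range(n, n_max + 1),
                key=lambda N: profile_log_likelihood(N, intervals)[0])
    return N_hat, profile_log_likelihood(N_hat, intervals)[1]


if __name__ == "__main__":
    # Made-up intervals whose last gap is long: the estimate equals n.
    print(jm_mle([1, 1, 1, 1, 100]))
    # Made-up shrinking intervals: the search runs into the cap.
    print(jm_mle([5, 4, 3, 2, 1], n_max=200))
```

The second call illustrates the practical difficulty: when failures arrive
faster as testing proceeds, the likelihood keeps increasing in N and no
finite maximizer is found within the search range.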

3.1 Properties of the Maximum Likelihood Estimators. The estimators
given by Equations (3.1) and (3.2) are quite straightforward to obtain, and
have been discussed by Jelinski and Moranda (1972) and by Shooman (1973).
However, the fact that $\hat{N}$ is highly misleading when $n$ is much
smaller than $N$ has been grossly overlooked. Specifically, when the quantity
$k/T$ is small, $\hat{N}$ tends to be unrealistically large and, furthermore,
a slight decrease in $k/T$ leads to a disproportionately large increase in
$\hat{N}$. Thus, for small values of $k/T$, the maximum likelihood estimator
$\hat{N}$ of $N$ is very unstable, and may lead to erroneous conclusions.

As $k/T$ becomes large, that is, if the times between failures during
the latter stages of testing are greater than those during the earlier stages,
$\hat{N}$ tends to be close to $n$, the observed number of failures. Thus
$\hat{N} \approx n$ is an indicator of the fact that the program is close to
being debugged, and should therefore provide us with a stopping rule. However,
as has been pointed out by Forman and Singpurwalla (1977), it is possible that
$\hat{N} \approx n$ and yet the true value $N$ may be far from $\hat{N}$. Thus
$\hat{N} \approx n$ is not always conclusive evidence that the program is
close to being debugged and, therefore, has to be interpreted with caution.
For a more conclusive analysis we will have to examine the behavior of the
likelihood function in greater detail. This is discussed in the next section.

3.2 The Relative Likelihood Functions and a Stopping Rule. The
relative likelihood function of $N$, $R(N)$, is defined as

$$R(N) = \frac{L(\hat{\phi}(N), N)}{L(\hat{\phi}(\hat{N}), \hat{N})} ,$$

where

$$\hat{\phi}(N) = \frac{n}{TN - k} .$$


Note that the shape of $R(N)$ is a function of $n$, $k$, and $T$.

We will need to compare $R(N)$ with the normal relative likelihood function
of $N$, $R_{\text{normal}}(N)$, defined as

$$R_{\text{normal}}(N) = \exp\left[ -\tfrac{1}{2}\, (N - \hat{N})^2 / \mathrm{Var}(\hat{N}) \right] ,$$

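where

$$\mathrm{Var}(\hat{N}) = \frac{n}{\, n \sum_{i=1}^{n} \left( \frac{1}{\hat{N}-i+1} \right)^{2} - \left( \sum_{i=1}^{n} \frac{1}{\hat{N}-i+1} \right)^{2} } .$$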
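
The two curves to be compared can be tabulated with a short sketch such as
the one below. It assumes the likelihood of Section 3 with phi replaced by
n / (T*N - k), and it takes the variance as an argument supplied by the
caller rather than computing it. The function names, the made-up data and
the value 2.0 used for the variance are illustrative assumptions.

```python
import math


def profile_log_likelihood(N, intervals):
    """Log-likelihood at integer N >= n with phi set to n / (N*T - k).

    Terms that do not depend on N are dropped, since only differences
    between values of N are needed for a likelihood ratio.
    """
    n = len(intervals)
    T = sum(intervals)
    k = sum(j * t for j, t in enumerate(intervals))  # weights (i - 1)
    return sum(math.log(N - j) for j in range(n)) - n * math.log(N * T - k)


def relative_likelihood(N, N_hat, intervals):
    """R(N): maximized likelihood at N divided by its value at N_hat."""
    return math.exp(profile_log_likelihood(N, intervals)
                    - profile_log_likelihood(N_hat, intervals))


def normal_relative_likelihood(N, N_hat, var_N_hat):
    """R_normal(N) = exp(-(N - N_hat)^2 / (2 * Var(N_hat)))."""
    return math.exp(-0.5 * (N - N_hat) ** 2 / var_N_hat)


if __name__ == "__main__":
    intervals = [1, 1, 1, 1, 100]  # made-up data for which N_hat = n = 5
    N_hat, var_N_hat = 5, 2.0      # variance value is illustrative only
    for N in range(5, 11):
        print(N,
              round(relative_likelihood(N, N_hat, intervals), 4),
              round(normal_relative_likelihood(N, N_hat, var_N_hat), 4))
```

Printing the two columns side by side gives a rough numerical substitute for
the visual comparison of the plots.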


In  the  light  of  our  previous  discussions,  plus  some  Monte  Carlo 
analyses  performed  by  Forman  (1974),  we  shall  propose  the  following  steps 
which  constitute  a stopping  rule. 


1.  Compute $\hat{N}$, the maximum likelihood estimator of $N$,
    using Equation (3.1).

2.  If $\hat{N} \approx n$, proceed to Step 3; if $\hat{N} \gg n$, observe
    another failure interval $t_{n+1}$, and go back to Step 1 above.

3.  Compute $R(N)$ and $R_{\text{normal}}(N)$ for various values of $N$,
    and see if the plots of $R(N)$ and $R_{\text{normal}}(N)$ are in good
    agreement with each other. If the two plots show a large disparity,
    then $\hat{N}$ is a misleading estimator of $N$. When this happens,
    observe another failure interval $t_{n+1}$ and repeat the above steps.
    If the plots of $R(N)$ and $R_{\text{normal}}(N)$ show good agreement,
    then $\hat{N} \approx n$ is a good estimator of $N$ and we do not have
    to test the software further to obtain $t_{n+1}$.

An  example  illustrating  the  application  of  the  above  stopping  rule  to 
some  real  data  on  software  failures  is  given  in  Forman  and  Singpurwalla  (1977). 
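
The three steps of the stopping rule can be mimicked in code, with the
caveat that the rule as stated relies on a visual comparison of two plots.
The sketch below replaces that judgment with two numerical tolerances,
`slack` (how far the estimate of N may exceed n) and `tol` (the largest
allowed gap between the two curves); both are hypothetical choices and are
not part of the rule. The variance is supplied by the caller through the
function `variance_of`, and `n_max` and `width` are likewise assumptions.

```python
import math


def profile_log_likelihood(N, intervals):
    """Log-likelihood at integer N >= n with phi set to n / (N*T - k),
    omitting terms that do not depend on N."""
    n = len(intervals)
    T = sum(intervals)
    k = sum(j * t for j, t in enumerate(intervals))  # weights (i - 1)
    return sum(math.log(N - j) for j in range(n)) - n * math.log(N * T - k)


def may_stop_testing(intervals, variance_of, slack=1, tol=0.1,
                     width=10, n_max=10_000):
    """Return True if the sketched rule says testing may stop."""
    n = len(intervals)
    # Step 1: maximum likelihood estimate of N over the integers.
    N_hat = max(range(n, n_max + 1),
                key=lambda N: profile_log_likelihood(N, intervals))
    # Step 2: if the estimate is well above n, more failure data are needed.
    if N_hat - n > slack:
        return False
    # Step 3: compare the relative likelihood with its normal counterpart.
    top = profile_log_likelihood(N_hat, intervals)
    var = variance_of(N_hat)
    disparity = max(
        abs(math.exp(profile_log_likelihood(N, intervals) - top)
            - math.exp(-0.5 * (N - N_hat) ** 2 / var))
        for N in range(n, N_hat + width + 1)
    )
    return disparity <= tol


if __name__ == "__main__":
    # Made-up data; the constant variance is a placeholder.
    print(may_stop_testing([5, 4, 3, 2, 1], lambda N_hat: 2.0))
    print(may_stop_testing([1, 1, 1, 1, 100], lambda N_hat: 2.0))
```

A return value of False corresponds to observing another failure interval
and repeating the steps.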

4.  A Test  of  the  Hypothesis  that  the  Software  Contains  No 
Errors  and  an  Optimal  Time  Interval  for  Testing 

Let us assume that a given software system has been subjected to the
debugging process and that all the steps of our proposed stopping rule have
been satisfactorily undertaken. A potential user of this program will be
interested in answers to the following two questions.

1.  What  is  the  assurance  that  the  program  contains 
no  more  errors? 

2.  How  much  additional  testing  should  be  done  in  order 
to  achieve  a specified  assurance? 

Clearly, an answer to the above questions will be a function of the
risks that the user is willing to take, the cost of testing, and the
consequences of software failure during its operation.

In what follows, we shall attempt to answer the above questions in a
quantitative manner. We shall attempt to answer the first question by
formulating it as a problem of hypothesis testing.

4.1 Testing of the Hypothesis. Let $N^*$ denote the number of errors
remaining in a debugged program. Let our null hypothesis be $H_0$, where

$$H_0 : N^* = 0 ,$$

versus the alternative hypothesis $H_1$, where

$$H_1 : N^* = r , \qquad r = 1, 2, \ldots .$$

In order to test the above hypotheses, we shall exercise the software
for an additional $t_a$ units of time, and reject $H_0$ if a failure is
encountered. Note that for such a test, the probability of rejecting the
null hypothesis when it is true is zero, since, when $N^* = 0$, we will not
encounter any failures. Thus, for our test the probability of the so-called
Type I error is zero.

Let $\beta$ denote the power of our test; that is, $\beta$ is the
probability of rejecting the null hypothesis when it is false. Since

$$\beta = P[T < t_a \mid N^* = r] = 1 - \exp(-\phi r t_a) , \qquad (4.1)$$

where $T$ here denotes the time to the first failure during the additional
testing, the power of the test can be made as large as is desired by
increasing $t_a$. If the user is willing to specify a $\beta$, then we can
calculate $t_a$ by choosing $r = 1$. When this is done, our test procedure
will have a power of at least $\beta$.

For convenience, we shall denote the dependence of $t_a$ on $\beta$ by
$t_a(\beta)$.


To summarize, our test of hypothesis will proceed along the following
lines.

The user will specify a $\beta$, the power of the test, and given $\beta$,
we shall determine $t_a(\beta)$ using Equation (4.1). We shall test the
software for a period of time $t_a(\beta)$, and accept $H_0$ if no failures
are encountered during that period; otherwise, we shall reject it.
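
Solving Equation (4.1) for the test time with r = 1 gives a test time of
-log(1 - beta) / phi. A minimal sketch of this calculation follows; the
function names are illustrative, and phi is assumed to be known or replaced
by its estimate from Section 3.

```python
import math


def power_of_test(t_a, phi, r=1):
    """Power from Equation (4.1): 1 - exp(-phi * r * t_a)."""
    return 1.0 - math.exp(-phi * r * t_a)


def required_test_time(beta, phi, r=1):
    """Test time at which the power equals beta when r errors remain."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return -math.log(1.0 - beta) / (phi * r)


if __name__ == "__main__":
    phi = 0.01  # illustrative value
    t = required_test_time(0.90, phi)
    print(t, power_of_test(t, phi))
    # With more than one remaining error the same test time gives more power.
    print(power_of_test(t, phi, r=2))
```

Choosing r = 1 is the conservative case: for any larger r the failure rate
is higher, so the same test time yields a power above beta.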

4.2 Choosing $t_a$ Based on Cost Considerations. We can also choose
the $t_a$ discussed above based on cost considerations and the mission time
$t_m$. Let $C_1$ be the cost per unit time for testing the software; suppose
that $C_1$ is a constant. Let $C_2$ be the cost incurred by the failure of
the software during the mission time $t_m$; $C_2$ is also assumed to be a
constant. Later on, we shall assume that $C_1$ changes with time. Three
outcomes are possible:

i.  The software fails during the additional testing
    time $t_a$, in which case the total cost is

        $C_1 t , \qquad 0 \le t \le t_a ,$

    where $t$ is the time at which failure occurs;

ii.  The software does not fail during the additional
    testing time, but fails during its operation at
    some time $t$; when this happens, the total cost is

        $C_1 t_a + C_2 , \qquad t_a \le t \le t_a + t_m ;$

iii.  No failure of the software is encountered; the
    total cost is

        $C_1 t_a , \qquad t_a + t_m < t .$

The total expected cost is therefore

$$E(C) = \int_0^{t_a} C_1 t\, \phi e^{-\phi t}\, dt
       + \int_{t_a}^{t_a+t_m} (C_1 t_a + C_2)\, \phi e^{-\phi t}\, dt
       + \int_{t_a+t_m}^{\infty} C_1 t_a\, \phi e^{-\phi t}\, dt .$$
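
The three integrals can be evaluated in closed form, since the time to
failure is exponential with rate phi. The sketch below computes each term
separately so that the three outcomes remain visible; the function name and
the numerical values are illustrative assumptions.

```python
import math


def expected_cost(t_a, t_m, phi, c1, c2):
    """Expected total cost for constant testing cost c1 per unit time."""
    survive_test = math.exp(-phi * t_a)
    survive_mission = math.exp(-phi * (t_a + t_m))
    # (i) failure at time t during testing: cost c1 * t, integrated
    #     against the exponential density over 0 <= t <= t_a.
    fail_in_test = c1 * ((1.0 - survive_test) / phi - t_a * survive_test)
    # (ii) failure during the mission: cost c1 * t_a + c2.
    fail_in_mission = (c1 * t_a + c2) * (survive_test - survive_mission)
    # (iii) no failure before t_a + t_m: cost c1 * t_a.
    no_failure = c1 * t_a * survive_mission
    return fail_in_test + fail_in_mission + no_failure


if __name__ == "__main__":
    # Illustrative values only.
    for t_a in (0.0, 50.0, 100.0, 200.0):
        print(t_a, expected_cost(t_a, t_m=100.0, phi=0.01, c1=1.0, c2=500.0))
```

Tabulating the expected cost over a grid of test times in this way is a
simple check on any analytical minimization.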


In order to solve for $t_a$, we shall minimize $E(C)$. Since

$$\frac{d}{dt_a} E(C) = e^{-\phi t_a} \left[ C_1 - \phi C_2 \left( 1 - e^{-\phi t_m} \right) \right] ,$$

we claim that when $C_1 > \phi C_2 (1 - e^{-\phi t_m})$, the value of $t_a$
at which $E(C)$ is minimized is zero. That is, when the cost of testing is
much larger than the cost of an operational failure, no additional testing
is necessary. However, a potential user may still wish to test the program
for $t_a(\beta)$ units of time and be assured that the power of his test is
at least $\beta$.


If, on the contrary,

$$C_1 < \phi C_2 \left( 1 - e^{-\phi t_m} \right) ,$$

then $t_a = \infty$ minimizes $E(C)$. This means that we should exercise the
software for an indefinite amount of time. Here, again, a potential user may
test for $t_a(\beta)$ units of time and get an assurance that the power of
his test is at least $\beta$.
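
With a constant testing cost the decision therefore reduces to comparing
the unit testing cost with the quantity phi * C2 * (1 - exp(-phi * t_m)). A
minimal sketch follows; the function names are illustrative, and the
boundary case of exact equality, in which the derivative vanishes for every
test time, is reported as None.

```python
import math


def expected_cost_slope(t_a, t_m, phi, c1, c2):
    """Derivative of the expected cost with respect to the test time."""
    return math.exp(-phi * t_a) * (c1 - phi * c2 * (1.0 - math.exp(-phi * t_m)))


def optimal_test_time_constant_cost(t_m, phi, c1, c2):
    """Cost-minimizing additional test time when c1 is constant."""
    threshold = phi * c2 * (1.0 - math.exp(-phi * t_m))
    if c1 > threshold:
        return 0.0       # cost increases with testing: do not test further
    if c1 < threshold:
        return math.inf  # cost decreases with testing: test indefinitely
    return None          # cost does not depend on the test time


if __name__ == "__main__":
    # Illustrative values: the threshold is about 3.16 here.
    print(optimal_test_time_constant_cost(100.0, 0.01, c1=10.0, c2=500.0))
    print(optimal_test_time_constant_cost(100.0, 0.01, c1=1.0, c2=500.0))
```

Because the sign of the derivative does not depend on the test time, only
these two extreme answers can arise under a constant testing cost.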

The assumption that $C_1$ is a constant may not be realistic in many
situations. In addition to this, this assumption may lead us to the case
of indefinite testing, as is shown above. We shall now relax this assumption
and explore the consequences.

Suppose that the cost of testing is $C_1(t)$, where $C_1(t)$ is a
convex non-decreasing function of $t$. Let us denote the derivative of
$C_1(t)$ evaluated at $t = 0$ by $C_1'(0)$.

We can now verify that when

$$C_1'(0) > \phi C_2 \left( 1 - e^{-\phi t_m} \right) ,$$

$t_a = 0$ will minimize $E(C)$. If this happens, we can choose $t_a(\beta)$
as our additional test time.


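If, on the contrary,

$$C_1'(0) < \phi C_2 \left( 1 - e^{-\phi t_m} \right) ,$$

then $t_a = (C_1')^{-1}\left[ \phi C_2 \left( 1 - e^{-\phi t_m} \right) \right]$
will minimize $E(C)$; $(C_1')^{-1}(\cdot)$ is the inverse of $C_1'(\cdot)$.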


As an example, if $C_1(t) = C_1 e^{\alpha t}$, for some $\alpha > 0$, then

$$t_a = \frac{1}{\alpha} \log \frac{\phi C_2 \left( 1 - e^{-\phi t_m} \right)}{C_1 \alpha} .$$
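
For this exponential cost function the optimal test time can be computed
directly. The sketch below returns zero when the derivative of the cost at
the origin, C1 * alpha, already exceeds the threshold, and otherwise applies
the logarithmic formula; the function name and the numerical values are
illustrative assumptions.

```python
import math


def optimal_test_time_exponential_cost(t_m, phi, c1, c2, alpha):
    """Cost-minimizing test time when testing for time t costs c1*exp(alpha*t)."""
    threshold = phi * c2 * (1.0 - math.exp(-phi * t_m))
    marginal_cost_at_zero = c1 * alpha  # derivative of c1*exp(alpha*t) at t = 0
    if marginal_cost_at_zero >= threshold:
        return 0.0
    # Solve c1 * alpha * exp(alpha * t) = threshold for t.
    return math.log(threshold / marginal_cost_at_zero) / alpha


if __name__ == "__main__":
    # Illustrative values only.
    print(optimal_test_time_exponential_cost(
        t_m=100.0, phi=0.01, c1=1.0, c2=500.0, alpha=0.05))
```

Unlike the constant-cost case, the convexity of the cost function yields a
finite test time whenever some testing is worthwhile.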


For the above situation, if $t_a \ge t_a(\beta)$, and should we test for
$t_a$ units of time, then we will not only be minimizing the total expected
cost, but will also achieve a power of at least $\beta$. If
$t_a < t_a(\beta)$, then $C_1 \left( e^{\alpha t_a(\beta)} - e^{\alpha t_a} \right)$
represents the increase in the cost of testing to achieve a power of at
least $\beta$.
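
The trade-off between the cost-optimal test time and the test time needed
for a given power can be sketched as follows, assuming the exponential cost
function of the example and r = 1 in Equation (4.1). The function name and
the numerical values are illustrative.

```python
import math


def extra_cost_for_power(t_a, beta, phi, c1, alpha):
    """Extra testing cost incurred by extending t_a to reach power beta.

    Returns 0.0 when t_a already gives a power of at least beta.
    """
    t_beta = -math.log(1.0 - beta) / phi  # test time for power beta, r = 1
    if t_a >= t_beta:
        return 0.0
    return c1 * (math.exp(alpha * t_beta) - math.exp(alpha * t_a))


if __name__ == "__main__":
    # Illustrative values only.
    print(extra_cost_for_power(t_a=80.0, beta=0.90, phi=0.01, c1=1.0, alpha=0.05))
    print(extra_cost_for_power(t_a=300.0, beta=0.90, phi=0.01, c1=1.0, alpha=0.05))
```

The first call shows how quickly the extra cost grows under an exponential
cost function when the power requirement calls for a much longer test than
the cost-optimal one.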